Automatic generation of KEGG OC (Ortholog Cluster) and its assignment to draft genomes
Abstract
As the number of sequenced genomes grows rapidly, a method for automatically generating orthologous gene clusters is needed. However, clustering a large number of genes at once is computationally expensive. To address this problem, we have developed a heuristic method that assigns gene groups from closely related organisms to ortholog clusters in a bottom-up manner. In this method, each gene subgroup is treated as a single representative gene, and correspondences between subgroups are found using bi-directional best hit (BBH) relations obtained from the KEGG SSDB database, which stores all-vs-all Smith-Waterman similarity scores [1]. We have clustered all the genes in the KEGG GENES database [1] to generate KEGG Ortholog Clusters (OCs), which represent various aspects of the protein universe. As an application of KEGG OC, we have performed automatic gene assignment for draft genomes that are not yet included in KEGG. Our method provides an efficient way to rapidly annotate the genes of newly sequenced organisms.
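The BBH relation mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use the KEGG SSDB interface: the function names, gene identifiers, and scores below are all hypothetical, and the sketch only shows the general idea that two genes form a BBH pair when each is the other's top-scoring hit.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring target gene.

    `scores` maps (query, target) pairs to a similarity score
    (hypothetical stand-in for Smith-Waterman scores).
    """
    best = {}
    for (query, target), score in scores.items():
        # Keep the first-seen target on ties (strict comparison).
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a and b are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Toy scores between genes of two hypothetical genomes A and B.
a_vs_b = {
    ("a1", "b1"): 520, ("a1", "b2"): 130,
    ("a2", "b1"): 95,  ("a2", "b2"): 410,
    ("a3", "b2"): 300,
}
b_vs_a = {
    ("b1", "a1"): 520, ("b1", "a2"): 95,
    ("b2", "a2"): 410, ("b2", "a3"): 300,
}

pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
print(pairs)
```

In this toy data, gene a3 has b2 as its best hit, but b2's best hit is a2, so a3 is left unpaired. In the bottom-up setting described above, the same test would presumably be applied to representatives of gene subgroups rather than to individual genes.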
Related articles
KEGG OC: a large-scale automatic construction of taxonomy-based ortholog clusters
The identification of orthologous genes in an increasing number of fully sequenced genomes is a challenging issue in recent genome science. Here we present KEGG OC (http://www.genome.jp/tools/oc/), a novel database of ortholog clusters (OCs). The current version of KEGG OC contains 1 176 030 OCs, obtained by clustering 8 357 175 genes in 2112 complete genomes (153 eukaryotes, 1830 bacteria and ...
Classification of Protein Sequences into Paralog and Ortholog Clusters Using Sequence Similarity Profiles of KEGG/SSDB
We are constructing KEGG/OC (Ortholog Clusters) from KEGG/SSDB (Sequence Similarity DataBase) [2]. KEGG/SSDB contains exhaustive protein sequence similarity scores of completed and nearly completed genomes calculated by the SSEARCH program [3]. KEGG/OC is constructed automatically from the graph analysis of searching cliques with an appropriate definition for the profiles of similarity scores. ...
KAAS: an automatic genome annotation and pathway reconstruction server
The number of complete and draft genomes is rapidly growing in recent years, and it has become increasingly important to automate the identification of functional properties and biological roles of genes in these genomes. In the KEGG database, genes in complete genomes are annotated with the KEGG orthology (KO) identifiers, or the K numbers, based on the best hit information using Smith-Waterma...
Biennial Report on Carcinogens Listing/Delisting Procedure
MBGD update 2015: microbial genome database for flexible ortholog analysis utilizing a diverse set of genomic data
The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, ...
Publication date: 2004